Multi-armed Bandit Mechanism with Private Histories

Authors

  • Chang Liu
  • Qingpeng Cai
  • Yukui Zhang
Abstract

The fundamental challenge in the bandit problem is the trade-off between exploration and exploitation. To minimize regret over a long horizon, an algorithm must explore by actually choosing seemingly suboptimal arms so as to gather more information about them. Exploration obviously incurs higher short-term regret. When recommending new items, the lifecycles of those items are remarkably short: we try to gather as much information as possible during an exploration phase and expect to reap rewards in the subsequent exploitation phase, but the gains are tiny, newer items keep arriving, and the next exploration must begin. We must therefore increase the intensity of exploration so as to gather information quickly, but this incurs even more regret.
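The exploration–exploitation trade-off described above can be illustrated with a standard epsilon-greedy bandit. This is a generic sketch, not the paper's mechanism: the arm means, horizon, and epsilon are illustrative parameters, and a larger epsilon corresponds to the more intensive exploration the abstract discusses.

```python
import random

def epsilon_greedy(arm_means, epsilon=0.1, horizon=1000, seed=0):
    """Simulate an epsilon-greedy bandit on Bernoulli arms: with
    probability epsilon pull a random arm (explore), otherwise pull
    the arm with the best empirical mean so far (exploit)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms          # pulls per arm
    estimates = [0.0] * n_arms     # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0     # Bernoulli draw
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total_reward += reward
    return total_reward, estimates
```

Raising `epsilon` gathers information about all arms faster, at the cost of pulling known-bad arms more often, which is precisely the short-horizon regret tension the abstract highlights for short-lived items.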


Similar articles

Algorithms for Differentially Private Multi-Armed Bandits

We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This problem arises in applications such as adaptive clinical trials, experiment design, and user-targeted advertising, where private information is connected to individual rewards. Our major contribution is to show that there exist (ε, δ)-differentially private variants of Upper Confidence Bound alg...
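The idea behind such variants can be sketched as perturbing the empirical statistics before computing the usual UCB index. The following is a simplified, hypothetical illustration, not the cited paper's algorithm: it adds Laplace noise of scale 1/ε to the reward sum on each query, whereas real (ε, δ)-DP variants budget the privacy cost across rounds far more carefully.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_ucb_index(reward_sum, count, t, eps, rng):
    """Hypothetical private UCB index: perturb the empirical reward
    sum with Laplace noise (scale 1/eps is an illustrative choice),
    then add the standard UCB1 confidence bonus."""
    noisy_mean = (reward_sum + laplace_noise(1.0 / eps, rng)) / count
    bonus = math.sqrt(2.0 * math.log(t) / count)
    return noisy_mean + bonus
```

The noise masks any single user's contribution to an arm's reward history, while the confidence bonus still shrinks as the arm is pulled more often, so the familiar UCB regret analysis degrades gracefully with ε.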

Full text

Characterizing Truthful Multi-armed Bandit Mechanisms

We consider a multi-round auction setting motivated by pay-per-click auctions for Internet advertising. In each round the auctioneer selects an advertiser and shows her ad, which is then either clicked or not. An advertiser derives value from clicks; the value of a click is her private information. Initially, neither the auctioneer nor the advertisers have any information about the likelihood of...

Full text

An Incentive Compatible Multi-Armed-Bandit Crowdsourcing Mechanism with Quality Assurance

Consider a requester who wishes to crowdsource a series of identical binary labeling tasks from a pool of workers so as to achieve an assured accuracy for each task, in a cost optimal way. The workers are heterogeneous with unknown but fixed qualities and moreover their costs are private. The problem is to select an optimal subset of the workers to work on each task so that the outcome obtained...

Full text

A quality assuring multi-armed bandit crowdsourcing mechanism with incentive compatible learning

We develop a novel multi-armed bandit (MAB) mechanism for the problem of selecting a subset of crowd workers to achieve an assured accuracy for each binary labelling task in a cost optimal way. This problem is challenging because workers have unknown qualities and strategic costs.

Full text


Journal title:

Publication year: 2017